We argue that imitation is a learning response to unintelligible actions, especially to social 
conventions. Various strands of evidence are converging on this conclusion, but further 
progress has been hampered by an outdated theory of perceptual experience. Comparative 
psychology continues to be premised on the doctrine that humans and non-human primates 
only perceive others' physical "surface behavior," while mental states are perceptually 
inaccessible. However, a growing consensus in social cognition research accepts the direct 
perception hypothesis: primarily we see what others aim to do; we do not infer it from their 
motions. Indeed, physical details are overlooked - unless the action is unintelligible. On 
this basis we hypothesize that apes' propensity to copy the goal of an action, rather than its 
precise means, is largely dependent on its perceived intelligibility. Conversely, children copy 
means more often than adults and apes because, uniquely, much adult human behavior is 
completely unintelligible to unenculturated observers due to the pervasiveness of arbitrary 
social conventions, as exemplified by customs, rituals, and languages. We expect the 
propensity to imitate to be inversely correlated with the familiarity of cultural practices, as 
indexed by age and/or socio-cultural competence. The direct perception hypothesis thereby 
helps to parsimoniously explain the most important findings of imitation research, including 
children's over-imitation and other species-typical and age-related variations. 

Keywords: phenomenology, perception, symbolic culture, development, chimpanzee, social cognition, 
enculturation, comparative psychology 



INTRODUCTION 

Theories in developmental and comparative psychology have 
been undergoing drastic changes, mainly driven by unexpected 
experimental findings. Not too long ago most primatologists 
believed that non-human primates, including our closest relatives,
the chimpanzees, lacked the capacity to understand
conspecifics as other intentional agents like themselves (Tomasello, 
1999). But more recent experimental designs are revealing more 
of the actual extent of their social understanding (Call and 
Tomasello, 2008). Similarly, there has been a trend in devel- 
opmental psychology to demonstrate various aspects of social 
cognition in increasingly younger human infants by using non- 
verbal, behavior-based experimental paradigms, thereby contra- 
dicting a long-held theory that "theory of mind" first emerges 
around 4 years of age (Baillargeon et al., 2010). These discoveries
have stimulated an ongoing discussion about which
theories can account for these new developmental and comparative
data (Hutto et al., 2011). Following Gallagher, we
agree that they best fit with recent theoretical developments 
in social cognition research that is focused on active per- 
ception, embodied cognition, and phenomenology (Gallagher 
and Povinelli, 2012). We contribute to these changes by crit- 
ically evaluating current theories of imitation in comparative 
psychology. 



Imitation is one type of social learning in which both the
form and goal of a modeled action are acquired by an agent from
another social being (e.g., Whiten and Ham, 1992; Hoppitt and
Laland, 2008). Examples of social learning are widespread among 
animals: for instance, young vervet monkeys will emit "eagle" 
alarm calls to almost any airborne object, including falling leaves, 
and with development the response is gradually tuned to air- 
borne predators only (Seyfarth and Cheney, 1986). There are a 
number of ways in which one organism can influence the prob- 
ability of another organism displaying a response. One animal, 
for example, might be foraging in a particular location, which 
draws the attention of another animal to that location (local 
enhancement). The mere presence of a conspecific might trig- 
ger certain responses; for instance, the probability of re-caching 
food by scrub jays and ravens increases in the presence of a conspe- 
cific observer (Emery and Clayton, 2001; Bugnyar and Heinrich, 
2005). In this article, we are concerned with imitation; in par- 
ticular, how the saliency of the observed action's goal versus the 
saliency of its physical means influences the fidelity of the imi- 
tative response (for a general review of social learning see, e.g., 
Hoppitt and Laland, 2008). Although imitation has been explored 
in a wide variety of animal species, including dogs (e.g., Fugazza 
and Miklosi, 2013), rats (e.g., Heyes et al., 1992), and pigeons
(e.g., McGregor et al., 2006), due to space limitations here we will
focus primarily on research involving non-human and human
primates. 

DEVELOPMENTS IN IMITATION RESEARCH 

The specific unintelligibility of conventional practices, such as customs
and language, was emphasized decades ago (Tomasello
et al., 1993) and continues to be noted (Gergely and Csibra, 2006). 
But research into imitation has so far failed to directly relate such 
differences in the intelligibility of actions to qualitative differences 
in the experience of observing them. Instead, it is claimed that 
all actions are equally perceived as nothing but physical motions, 
thus requiring mentalistic inferences about their intentions, with 
some actions being more cognitively opaque than others (Csibra 
and Gergely, 2006). We argue that this neglect of how different 
categories of action are actually experienced, rather than intellectually
assessed, has long resulted in an overemphasis on the role of
inferential reasoning in imitation and of imitation's role in copying 
instrumental actions. Many studies have investigated variations in 
the fidelity of copying tool-based actions while neglecting imita- 
tion's role in communicative and other social practices. But this is 
starting to change on the basis of recent findings in comparative 
and developmental psychology. 

Surprisingly, it was found that children frequently imitate 
instrumental actions even if they are clearly causally unnecessary 
to achieve the goal of the demonstrator, thus exhibiting so-called 
over-copying (Whiten et al., 2005b) or over-imitation (Lyons et al.,
2007). For example, one 3-year-old child twisted a non-functional
pin 161 times after seeing a demonstrator twist the same pin only
16 times, in a study by Whiten et al. (1996). Over-imitation has
been consistently documented for children, but not for young and
older chimpanzees (Nagell et al., 1993; Horner and Whiten, 2005;
Whiten et al., 2005b; but see Hobaiter and Byrne, 2010); it is more
consistent in 5-year-old than in 3-year-old children (McGuigan
et al., 2007), and it has been demonstrated in a cross-cultural context
(Nielsen and Tomaselli, 2010). Children can correctly identify
the irrelevant actions, for example as being "silly"; they do not
copy them merely to please the demonstrator, and they will even
continue copying them despite explicit instructions to the contrary
(Lyons et al., 2007).

Given these puzzling findings it may seem that human infants 
develop "towards more 'mindless' blanket copying" (Whiten et al., 
2009, p. 2427). However, there is a growing consensus that over-imitation
is actually a rational strategy for learning a specific class
of behaviors. While attempts to relate this phenomenon to causal 
learning of complex tool-use persist (Lyons et al., 2011), there is
increasing evidence that its main purpose is the acquisition of 
behavioral norms that are based on arbitrary social conventions. 
Children give conventional explanations of their over-imitated 
actions, "implying that what they did was prescribed (e.g., 'I had to 
do it how they showed me' or 'I had to do it the way they did it')" 
(Herrmann et al., 2013, pp. 540-541). They protest if others fail to 
over-imitate (Keupp et al., 2013), and they will continue protesting 
even after they have seen others succeed at realizing the same goal 
while omitting the causally unnecessary action (Kenward, 2013). 

Accordingly, even researchers who have extensively argued that 
imitation's primary function is to enable children's social learning
of complex instrumental actions (e.g., Csibra and Gergely,
2006) are forced to modify their theories to incorporate a more
significant role of normativity in tool-use (Kiraly et al., 2013).
Nevertheless, we argue that this continuing focus on causal learn- 
ing of cultural artifacts is a bias derived from our own modern 
science- and technology-saturated cultural environment. For most 
of hominid evolutionary history, others' intentions of tool-use and
tool-creation probably were relatively self-evident to conspecifics, 
especially because for around two million years technological 
development proceeded at the pace of biological evolution itself 
(Ambrose, 2001). 

Even unfamiliar instrumental actions can largely be under- 
stood in a contextually constrained manner due to the causal 
necessity of using certain actions to achieve some goal, given the 
circumstances. 

Unfamiliar symbolic actions, on the other hand, tend to be 
utterly opaque because their underlying means are not determined 
by causal necessity, but by historically contingent social norms. 
The conventionally constrained relationship of signifier and signi- 
fied can be completely arbitrary and therefore must by necessity be 
acquired by "blind" imitation or pedagogy. To be sure, advanced 
technologies can reach similar levels of opacity, but when such 
techniques first developed in the Middle Stone Age, for exam- 
ple the manufacture of compound adhesives, the prerequisites of 
symbolic cognition were likely already in place (Wadley, 2011). 
It is therefore possible that it was an increase in social norms
in early hominid societies that first necessitated an improved
capacity for faithful imitation, while the improved transmission 
of advanced instrumental techniques was a beneficial side-effect. 
We will return to the question of the origins of human imitation
at the end of this article. 

The crucial qualitative differences between perceiving contex- 
tually constrained and conventionally constrained actions have 
long been ignored because, as we will argue in more detail below, 
the theory of perception standardly employed in comparative psy- 
chology is misguided. Phenomenologists, on the other hand, have 
begun to remind scientists that the intentions of most observed 
actions, including instrumental actions, are directly perceivable by 
others - without the necessity of having to overcome any kind of 
opacity by engaging in mentalistic or behaviorist inference (Gal- 
lagher and Povinelli, 2012). This direct perception of intention or 
meaning makes the perception of the underlying physical details 
difficult - unless the other's action is unintelligible to the observer, 
for example because it is an utterance in an unfamiliar language. 
This basic phenomenological insight, supported by a variety 
of psychological evidence that is reviewed further below, has 
important theoretical consequences for comparative psychology. 

It has been widely recognized that humans are cultural ani- 
mals and that it is adaptive for children to become enculturated as 
fast as possible, but it has remained puzzling how they can learn 
general norms from "single observations of tokens of the action" 
(Rakoczy and Schmidt, 2013, p. 20), especially because it seems 
that "unfortunately for children, information available in the envi- 
ronment does not come tagged as being cultural" (Diesendruck 
and Markson, 2011, p. 189). This puzzle has led to an emphasis
on the role of ostensive cues and pedagogy in human imitation
(Gergely and Csibra, 2006; Kiraly et al., 2013). However, although
others can provide guidance, this help is not required for one-shot 
learning of norms. Children will interpret one unnecessary action
as conventionally constrained as long as it is performed intentionally
(Schmidt et al., 2011). This ability is less surprising from the
phenomenological perspective because novel conventionally con- 
strained actions are indeed to some extent "tagged" as such when 
perceived by an observer to whom they are unfamiliar. They are 
directly perceived as intentional actions, yet are simultaneously 
seen as causally unnecessary and unintelligible. Importantly, this 
unintelligibility also facilitates faithful imitation because percep- 
tion of an action's meaning and perception of its physical means 
are co-dependent processes, which mutually exclude each other 
from focus, similar to the relationship between figure and ground. 
Lack of perceived meaning therefore makes the underlying phys- 
ical means more salient. Of course, not every intentional action 
that a child perceives to be non-sense in this way is an unfamiliar 
norm-governed action, so there will be false positives, especially
in artificial experimental situations - precisely what has become 
known as over-imitation. 

In summary, we suggest that children's over-imitation is a
highly selective action by which they specifically pick out those 
adult actions whose meanings are perceptually unintelligible, and 
which are therefore most likely determined by social convention. 
The phenomenological claim that this unintelligibility is mani- 
fested as a perceptual opacity, rather than as a cognitive opacity 
(Gergely and Csibra, 2006), is supported by a variety of evidence, 
including the fact that rational imitation can be affected by modu- 
lating the perceptual salience of the observed action (Beisert et al., 
2012). More generally, it has been repeatedly demonstrated that 
how we understand others' actions modulates our perceptual experience
of the underlying physical details (Teufel et al., 2010). This
and related evidence is discussed in more detail further below. We 
therefore propose that children's over-imitation is best understood
as a special instance of a more general inverse correlation between 
an individual's propensity to imitate an action and that action's 
perceptual intelligibility. We suggest that this general inverse cor- 
relation is found across primates, but that humans have become 
adapted to take advantage of it in the service of more effective 
enculturation during their development. 

DEVELOPMENTS IN SOCIAL COGNITION RESEARCH 

Theoretical developments in our understanding of social cog- 
nition provide new perspectives for explaining discoveries in 
comparative and developmental psychology (Racine and Carpen- 
dale, 2007; Gallagher and Povinelli, 2012). There are at least two 
important developments. First, there is the hypothesis of embod- 
ied cognition (HEC), which proposes that cognition is primarily 
embodied and interactive, such that real-time bodily interactions 
between two or more people can be partially constitutive of some 
social cognitive processes (see, e.g., De Jaegher and Di Paolo, 2007; 
Krueger, 2011; Froese and Gallagher, 2012). Second, there is the 
hypothesis of direct perception (HDP), which proposes that per- 
ceptual experience primarily is a process of directly revealing or 
disclosing the meaning of the perceived (Gallagher, 2008a; Zahavi, 
2011). There are two complementary aspects to the HDP. 

On the one hand, the HDP implies that when we perceive a 
part of our physical environment, we directly perceive the meaning
it has for us. For example, we perceive an object in terms of its
implicit affordances for interaction (Noe, 2004), and these affordances
are shaped by our social context, including norms of usage
(Gallagher, 2008b). On the other hand, the HDP makes a specific 
claim about how we perceive other people, namely as other agents 
with mental lives like ourselves. Their bodily presence is encoun- 
tered as an affordance for social interaction (Krueger, 2012). 
Moreover, biologically constrained bodily expressions, contextu- 
ally constrained tool-use, and familiar conventionally constrained 
practices are directly perceived as intentional and goal-directed. 
Not all intentions are perceptually transparent to the same degree,
and some actions can be deceptive, but in everyday social encoun- 
ters our direct perceptual insight is often sufficient such that 
reflection about the other's beliefs and desires is not necessary 
for successful social interaction (Ratcliffe, 2007). 

Even though the HEC and the HDP can be defended indepen- 
dently, they make good theoretical complements. If the HEC is 
correct that aspects of social cognition can sometimes be directly 
realized in embodied social interaction with others, and more gen- 
erally that cognition can be directly constituted by our embodied 
comportment in the world, then the HDP becomes less mysteri- 
ous. The mental lives of others are perceptually accessible because 
their minds are not hidden inside their brains but embodied and 
realized in their actions. Other people's minds are seen in their 
worldly comportment (Krueger, 2012), and are experienced during
social interaction (De Jaegher, 2009). This is especially true of
basic emotions (Stout, 2012), but it can also hold for aspects of 
the classic belief-desire psychology (e.g., I directly perceive that a 
pupil believes she has failed the exam in her withdrawn expression 
and slumped posture - no additional explicit thought process is 
required on my part).^1

OVERVIEW OF THE ARGUMENT 

The rest of this article unfolds in four stages. First, we critically 
examine the theory of perception that has traditionally informed 
comparative and developmental psychology and show that its log- 
ical consequences do not easily fit with the empirical findings 
of current imitation research. Then we briefly review evidence 
from phenomenology and psychology to independently motivate 
the acceptance of a more adequate theory of perception, specif- 
ically the HDP. We then argue that the logical consequences of 
this hypothesis fit better with what is generally known about imi- 
tation, and apply the hypothesis to clarify central issues in the 
debate about the development and evolution of imitation. 

LOGICAL CONSEQUENCES OF THE STANDARD THEORY OF 
PERCEPTION FOR IMITATION 

Cognitive science has traditionally treated perception as a separate 
input stage that is independent of the rest of the cognitive sys- 
tem. Perception is a form of information processing that converts 
external physical stimuli into internal mental representations (a 



^1 There may be other ways of defending the phenomenological insights of the HDP
apart from the HEC, for example in terms of subpersonal versions of Simulation
Theory (Gallese, 2005) and Theory Theory (Gopnik, 1993). But what is more important
here is that defenders of these competing theories are nevertheless in general
agreement that the HDP makes a valid description of our personal-level social 
phenomenology (Spaulding, 2010). 
transductive process) to be used by the cognitive system for reasoning
about the current state of the world and hence what to
do next. According to this view - we will call it the hypothesis of
physical perception (HPP) - cognition is entirely contained within 
the transductive envelope of perception, and perception primarily 
provides agents with a detailed set of facts about the external envi- 
ronment as it is conceived of by classical physics (e.g., the position, 
volume, velocity, etc., of distinct objects). If the HPP is taken as 
the theoretical starting point we end up with the following logical 
deduction about social cognition (Froese et al., 2012a):

HPP-(1): The intentional actions of other agents can only be
perceived as abstract physical motions in objective space.

HPP-(2): Given HPP-(1), the other's agency and intentions are
not directly observed via perceptual experience^2.

HPP-(3): Given HPP-(2), the other's agency and intentions are
secondarily derived and attributed to the observed physical
motions by means of additional social cognition
(e.g., a "theory of mind mechanism").

The starting premise of this deduction, HPP-(1), is typically
phrased in the literature in terms of the metaphorical contrast 
between a surface and its hidden content. For example, it was 
once hotly debated whether chimpanzees can understand that con- 
specifics have minds of their own, given that perception can only 
provide access to "surface-level behavior," and given that this social 
ability would require them to somehow go "beneath the surface" 
(Call and Tomasello, 2008, p. 187). Since it is assumed from the 
beginning that perception cannot do this job (HPP-2), but there is 
evidence of social understanding in chimpanzees, it is necessary to 
postulate another cognitive process (HPP-3). A similar process of 
reasoning is often applied to the social understanding of human 
children and adults. It follows that HPP-(1) is foundational to
the concept of "theory of mind" in comparative and developmental
psychology (Froese et al., 2012a). We find explicit claims to
this effect by leading experts throughout the whole history of the 
cognitive sciences. 

In saying that an individual has a theory of mind, we mean that the
individual imputes mental states to himself and to others (either to 
conspecifics or to other species as well). A system of inferences of 
this kind is properly viewed as a theory, first, because such states are 
not directly observable, and second, because the system can be used to 
make predictions, specifically about the behavior of other organisms. 
(Premack and Woodruff, 1978, p. 515; emphasis added) 

normal children give elaborate verbal descriptions of the unobservable 
psychological states of people, indicating that they relate observable 
actions to underlying mental states. (Meltzoff, 1995, p. 838; emphasis 
added) 

Generally, the observable behavior of individuals is never transpar- 
ent either in respect to the background knowledge that governs their 
actions or in respect to the ultimate goal of the action (if it were trans- 
parent, cognitive psychology would not exist as a scientific discipline; 
Csibra and Gergely, 2006, p. 252). 



^2 We use the term "perceptual experience" rather than the more common "perception"
because the latter is sometimes interpreted as referring to a subpersonal
process, while we are specifically interested in describing the subject's own first- 
person perspective on the world. Although we sometimes use the term "perception" 
out of convenience, it should always be understood in this personal-level sense. 



This classic dualism between bodily behaviors and mental states 
continues to inflect and bias the debate in comparative psychology 
(Racine and Carpendale, 2007). For example, Call and Tomasello
(2008, p. 189) concluded that "chimpanzees, like humans, under- 
stand the actions of others not just in terms of surface behaviors 
but also in terms of the underlying goals, and possibly intentions, 
involved" (emphasis added). In accordance with the HPP, it is 
claimed that "the goals and perceptions of others are not readily 
observable, and so require inferences" (Tomasello, 2008, p. 176; 
emphasis added). The possibility that social understanding is a 
direct perceptual achievement in most normal situations is thereby 
excluded by definition. 

This limited view of perceptual experience has important impli- 
cations for how researchers in developmental and comparative 
psychology approach the phenomenon of imitation. We can 
deduce a couple of predictions about what would happen when 
an agent, who is operating according to the principles of the HPP, 
intends to replicate the observed behavior of another agent. 

HPP-(4): Given HPP-(2), an agent's replication of observed behavior
is primarily guided by the other's abstract physical
motions in objective space and their causal consequences
on the environment.

HPP-(5): Given HPP-(3), an agent's replication of observed behavior
can also be guided by the other's intention, but only to
the extent that the observer has the additionally required
social cognitive capacity.

The ways in which the replication of an observed behavior 
is guided (with or without goal understanding) and performed 
(copying means or ends) have been differentiated in the liter- 
ature. In the absence of any goal understanding, a replication 
of the means of an observed action is typically referred to as 
mere "mimicry," whereas a replication of the means that is also 
guided by an understanding of the other's goal is called "imitation"
(Tomasello et al., 1993). Nevertheless, this terminology is
not consistently applied in the literature since the phenomena of 
neonatal imitation and children's over-imitation arguably do not 
involve an understanding of the other's goals, and should therefore 
be classed as forms of mimicry. 

A replication of the effects of an action, but by other means 
than those observed, is not called imitation but "emulation." The 
role of goal understanding in emulation is controversial (Huang 
and Charman, 2005). It was initially proposed that emulation lacks
goal understanding, since the replication of the results could be 
based on the observed results alone (Tomasello et al., 1993). But
evidence demonstrating that 18-month-old children re-enact and 
complete the goals of incomplete or failed actions suggests they 
employ goal emulation (Meltzoff, 1995), as does evidence that
14-month-olds and enculturated chimpanzees emulate more often
when the reasons for the movements are clear to them (Gergely
et al., 2002; Buttelmann et al., 2007).

Thus, both the replication of means and the replication of ends may
or may not involve goal understanding. According to the HPP, the
physical means and the physical outcomes of an observed action 
are both given in perceptual experience, thus seemingly making 
imitation easy. On the other hand, the means used to emulate 
an observed result are by definition different from the means of 
the perceived action (otherwise it would be imitation), thereby
requiring a creative response so as to avoid imitation. Accordingly, 
it seems that the received theory, which holds that imitation is 
comparatively rarer because it is more complex than emulation, 
is problematic (Call and Carpenter, 2002). Indeed, following the 
logic of the HPP, we end up with precisely the opposite conclusion, 
namely that exact copying of means is less complex and should 
therefore be the more common form of replication. Mimicry of 
observed actions is always possible without additional physical or 
social cognition (i.e., reasoning about causal relations or goals), 
whereas emulation always requires additional physical cognition 
to devise alternative means - with or without social cognition 
(although goal understanding would certainly help to inform 
the creative process). The HPP-based theories thereby arrive at 
a puzzling prediction: 

HPP-(6): Given HPP-(4) and HPP-(5), copying the means of 
an observed action is cognitively less demanding than 
emulating its intended results. 

This is an odd prediction because extensive research in com- 
parative psychology tells us that precisely the opposite should be 
the case. Faithful imitation is a much less common skill than 
emulation - some have even argued that it is limited to humans 
(Tomasello, 2008). But if imitation is so simple, why do non- 
human primates not simply copy what they perceive? The received 
view has formulated two responses. 

Two decades ago it was still widely accepted that chimpanzees 
imitate less than humans because they lack the required social 
cognitive processes. For chimpanzees "the intentional states of the 
demonstrator [are] either not perceived or irrelevant," whereas for 
humans, "the goal or intention of the demonstrator is a central 
part of what they perceive" (Tomasello, 1996, p. 331)^3. This initial
theory had to be revised after experimental evidence showed that 
apes understand that others have goals and behave toward them 
according to what they perceive. Instead it was claimed that apes 
still lacked an understanding of the "more mental dimensions of 
intentional action [. . .] - specifically those that have to do with 
the decision-making process by which the actor generates action 
plans and, based on a rational assessment of reality, chooses one 
to enact in intentional action" (Tomasello et al., 2005, p. 685).
However, this theory also had to be rejected because of growing 
evidence that chimpanzees have a range of social skills, including
an understanding of others' goals (Tomasello et al., 2003; Call
and Tomasello, 2008). In particular, there is evidence for rational 
imitation in enculturated chimpanzees, i.e., the fact that chim- 
panzees are more prone to imitate those aspects of observed action 
sequences that appear to be intentionally made but whose causal 
reasons are not self-evident (Buttelmann et al., 2007).

The current verdict is therefore that "[a]pes understand that 
others have goals and perceptions and how these relate to one 
another in intentional action, perhaps even rational action" 
(Tomasello, 2008, p. 177). All of this undermines the original 
hypothesis that the propensity to imitate is positively correlated 



^3 Tomasello here seems to explicitly acknowledge that humans can directly perceive
the goals and intentions of others, as proposed by the HDP. However, to our knowl- 
edge he does not further develop the implications of this phenomenological insight 
in his theories. 



with an understanding of other minds. Indeed, this should not 
come as a surprise since even adult humans - presumably having 
the most sophisticated social skills of all animals - imitate sig- 
nificantly less than human infants (Horowitz, 2003). It therefore 
seems that other factors must be in play. 

This leads us to the second response to this dilemma, which 
accepts that non-human primates have intentional understanding 
and that imitation should be more common among non-human 
primates. For example, "mirror" neurons were first discovered 
in macaques, and a widely accepted interpretation of their func- 
tion holds that intentional understanding is based primarily on 
a mechanism that directly matches the sensory representation of 
the observed actions with one's own motor representation of those 
same actions (Rizzolatti and Sinigaglia, 2007). Bodily mirroring 
can also be an emergent outcome of the coordination dynamics 
of social interaction (Froese et al., 2012b). Given that such "motor 
mimicry" is assumed to be an automatic response, and given that 
it is assumed that perception delivers the requisite physical details, 
it is surprising that monkeys (and primates in general) do not imi- 
tate each other all the time. Consequently, some researchers have 
turned their interests toward explaining the neural mechanisms 
of the active inhibition and control of, rather than initiation of, 
imitation (e.g., Brass etal., 2009; Rumiati etal., 2009). According 
to these researchers an individual requires intentional control to 
ensure that their imitation is goal-directed rather than compul- 
sive (Heyes, 2009). Evidence of deferred imitation in enculturated 
chimpanzees supports the claim that imitation is not a mere reflex 
response, but rather an action that is under intentional control 
(Bjorklund et al., 2002).

But this response simply brings us to another version of the 
same conundrum: why do apes not disinhibit imitation more 
often, especially if they apparently have sufficient intentional con- 
trol to inhibit its execution unless that imitative action suits their 
goals? Following this response, the empirical data is still rather
puzzling because it seems to indicate that non-human primates, 
for no apparent reason, reliably fail to disinhibit their existing auto- 
matic imitative responses even when their intentionally directed 
emulation consistently fails to bring about the desired ends. For 
example, one study of captive chimpanzees demonstrated that they 
will continue begging for food in their usual, but evidently unsuc- 
cessful, manner without trying to copy the successful begging 
gestures of their specially trained conspecifics (Tomasello et al.,
1997). We will discuss this negative finding in more detail later on. 

And if non-human primates in general seem to exhibit too 
much active inhibition of imitation, then why do human children 
show too little of it, as suggested by their "surprisingly unselective
over-imitation" (Whiten et al., 2009, p. 2417)? For example,
surely children have good enough physical cognition to quickly 
learn when turning a pin is in fact a non-functional motion, so 
why repeat that observed motion over a hundred times (Whiten 
et al., 1996)? Equally surprising, from this theoretical point of
view, is the fact that imitation by adults is more like that of chimpanzees.
Lieven and Stoll (2013) reported imitation in only the
youngest children of their two-culture sample. Similarly, Horowitz 
(2003) found that human adults, like chimpanzees on a similar 
task, were significantly more likely to emulate than to precisely 
imitate actions that were demonstrated in the opening of an 
"artificial fruit." And this happened even though the participants
later claimed that they had believed themselves to be imitating 
all along. If they had indeed intentionally tried to disinhibit their 
automatic imitative response, why did they end up emulating the 
demonstration? 

This tension between (a) the a priori assumption that percep- 
tual experience provides a detailed mental representation of the 
physical environment, including of the motions of other agents, 
and (b) the empirical prevalence of emulation over imitation, 
warrants a reconsideration of the general validity of the HPP. 
Maybe mental states are not as perceptually hidden, and physical 
details not as perceptually evident, as has been hitherto assumed 
in comparative psychology. 

EVIDENCE FOR THE HDP FROM PHENOMENOLOGY AND 
PSYCHOLOGY 

It may be argued that we cannot assess whether the HDP better 
accounts for the point of view of non-human primates because 
we cannot know what it is like for them to perceive the world. 
However, while we can never be absolutely certain about another 
agent's first-person experience (Nagel, 1974), this kind of certainty 
is not required for doing science. It is still possible to motivate a
more general acceptance of the HDP by realizing that it accurately 
describes our own point of view, and by demonstrating that its 
validity can be indirectly confirmed on the basis of its behavioral 
consequences. 

EVIDENCE FROM PHENOMENOLOGY 

Phenomenologists have long emphasized that we normally 
experience ourselves to be embodied in meaningful situations 
(Heidegger, [1927] 1962; Merleau-Ponty, [1945] 2002). There is
a consensus that under normal circumstances we directly per- 
ceive other persons as being intentional agents in their own right, 
and that much of others' psychological states is immediately per- 
ceivable in the way in which they comport themselves in the 
world, especially in social interaction (Ratcliffe, 2007; Gallagher, 
2008a; Zahavi, 2011). We cannot here review the vast litera- 
ture of phenomenology as it pertains to other people (but see, 
e.g., Gallagher and Zahavi, 2008). Instead we provide an illustra- 
tive example of the phenomenological analysis of the perception 
of another person's expressions. As Scheler ([1923] 2008) once
remarked, although it may appear self-evident to "intellectualist" 
(i.e., cognitivist) theories that we perceive nothing of another per- 
son apart from their physical body and its objective movements in 
space, it only requires the simplest reflection about our own lived 
experience to show that there is nothing self-evident about this. 

For we certainly believe ourselves to be directly acquainted with another 
person's joy in his laughter, with his sorrow and pain in his tears, with 
his shame in his blushing, with his entreaty in his outstretched hands, 
with his love in his look of affection [...]. If anyone tells me that this is
not 'perception', for it cannot be so, in view of the fact that a perception
is simply a 'complex of physical sensations', [. . .] I would beg him 
to turn aside from such questionable theories and address himself to 
the phenomenological facts. (Scheler, [1923] 2008, p. 260, emphasis
added). 

This insight about our direct perceptual experience of other 
people is not limited to the phenomenological tradition of philosophy.
Similar descriptions can also be found in the analytic
tradition, for example as famously expressed by Wittgenstein (see
also, e.g., McDowell, 1982). 

"We see emotion." - As opposed to what? - We do not see facial con- 
tortions and make the inference that he is feeling joy, grief, boredom.
We describe a face immediately as sad, radiant, bored, even when we 
are unable to give any other description of the features. - Grief, one 
would like to say, is personified in the face. (Wittgenstein, quoted in 
Overgaard, 2007, p. 128.) 

The notion that the meaning of an expression or gesture can 
be directly perceived, while physical features are relatively inacces- 
sible or absent, lies at the core of our hypothesis. Even a person's 
whole physical individuality can be hidden behind their perceived 
meaning. For example, Gurwitsch ([1931] 1979) has observed that 
other people are often primarily encountered in terms of their 
social roles, and that their role partially constitutes the meaning 
of a situation. He therefore remarks that individual role-bearers 
can be substituted for each other without much disruption to a 
social understanding of the situation, since "only in this role do 
I have something to do with him. In this situation, his being is 
exhausted in the role whose bearer he is" (Gurwitsch, [1931] 1979, 
p. 108). Of course, other people only appear as completely defined 
by their social roles in some generic kinds of social situation, such 
as explaining directions to a stranger (as we will see in more detail 
below), handing your ticket to a train conductor, etc. More would 
need to be said about the ways in which others are experienced as 
individual people (Ratcliffe, 2007, pp. 58-84). 

It is only during serious forms of psychopathology that the 
world and other people are experienced as nothing but a jum- 
ble of meaningless objects and mindless automata (Stanghellini, 
2004). In such unfortunate cases the observer is forced to engage 
in explicit reflection and inference-making about the meaning- 
less observed movements of others (thereby effectively creating a 
personal-level theory of mind^) in order to compensate for the 
lack of direct perceptual insight into the intentions and meanings
of even the most basic kinds of actions (Froese et al., 2013).
Indeed, accounts written by sufferers of schizophrenia give us 
phenomenological insight into what it could be like to only per- 
ceive the meaningless "surface" behavior of another person, as is 
assumed by the HPP. Consider the following description by a girl 
who lost her ability to directly perceive others' embodied mind- 
edness, such that she was confronted by an unbearable perception 
of another's body as some kind of inanimate physical machine, 
rather than as an expressive body of another person in their own 
right: 

I saw her eyes, her nose, her lips moving, heard her voice and under- 
stood what she said perfectly, yet I was in the presence of a stranger 
[. . .] She seems more a statue than ever, a manikin moved by a mech- 
anism, talking like an automaton. It is horrible, inhuman, grotesque. 
(Renee, quoted in Sechehaye, 1970, pp. 36-38.) 



^Mainstream opinion holds that many forms of psychopathology are caused by an 
absence or malfunction of "Theory of Mind", whereby the latter is defined as the 
process of making of inferences about other people's states of mind on the basis 
of their physical movements. However, the reverse seems to be more correct, i.e., 
that because these psychopathologies involve a disruption of our normal direct 
perceptual insight into others' meaningful expressions, patients are confronted with 
nothing but others' physical movements and must therefore theorize about their 
hidden meanings - with limited success (Froese et al., 2013). 
This patient's description makes it painfully clear that the HPP
mischaracterizes our normal perceptual experience of other per- 
sons. We normally do not perceive others in terms of only their 
surface behavior, such as the mere movements of an automaton's 
body parts; we normally directly perceive others as living, inten- 
tional agents just like ourselves, who act for reasons rather than 
merely mechanically - and we realize this without any extra need 
for cognizing. 

EVIDENCE FROM PSYCHOLOGY 

Some scientists may find these phenomenological insights too 
anecdotal to be taken seriously, but there are a number of exper- 
imental paradigms in psychology that also support the HDP and 
that undermine the guiding assumption of the HPP, i.e., that we are 
normally presented with a highly detailed physical environment in 
our perceptual experience. We begin by highlighting evidence that 
the perceived meaning of an object partially hides its physical char- 
acteristics. We then discuss evidence that the perceived meaning 
of a situation can even hide whole objects and personal identities. 

First, experiments in categorical perception have repeatedly 
demonstrated that our perceptual experience is shaped by a so- 
called "perceptual magnet effect," which implies that the structures 
of our experience are partially constituted by our learned concep- 
tual categories (Harnad, 2003). Certain physical details will be 
more or less accessible to experience depending on the categories 
by which we make sense of the world. Regarding social perception 
it has been demonstrated that our conceptual categories influence
how we perceive others' vocalizations (Iverson et al., 2003)
as well as their facial expressions (Kotsoni et al., 2001). For exam- 
ple, discriminating others' facial expressions within a meaningful 
category of emotion is more difficult than discriminating them 
across different categories, even if they differ by an equal physical 
amount (Etcoff and Magee, 1992). Specifically, given a computer- 
generated continuum of facial expressions from happy to sad, it is 
more difficult to tell apart two images of happy faces (or sad faces) 
than to differentiate between two images of faces that express an 
undefined feeling between happy and sad. Anticipation of emo- 
tions can also lead to the misperception of facial details (Palumbo 
and Jellema, 2013). 

Systematic cultural differences in perception and social cogni- 
tion have also long been reported by ethnographers (e.g., Lillard, 
1998; Vinden, 1999; Boesch, 2007; Henrich et al., 2010). In psychology
there is a field of study dedicated to elucidating how the
natural and socio-cultural context of the perceiver shapes their 
experience, including their susceptibility to illusions (Caparos 
et al., 2012). There is also growing evidence that believing others to 
be intentional agents has top-down effects on perception, such as 
modulating how their physical movements are perceived (Moore 
et al., 2013) and on mechanisms of attentional selection (Wiese
et al., 2012). The fact that there is a co-dependence between basic 
sensory processing of others' physical characteristics and higher- 
level social understanding of others has been taken to support 
Wittgenstein's observation that we experience ourselves as directly 
seeing other people's emotions, intentions, and attention (Teufel 
et al., 2010).

Since it may be difficult to intuitively grasp what it means to 
fail to notice physical details when perceiving another's body, the
reader is encouraged to experience this effect from her own first-person
perspective. We therefore reproduce the "Thatcher illusion"
(Thompson, 1980), which is particularly relevant for generaliz- 
ing these kinds of findings to comparative psychology, because it 
has been demonstrated to apply to the perception of non-human 
primates as well, including chimpanzees and, to a lesser extent, 
monkeys (Nakata and Osada, 2012; Weldon et al., 2013). The perceiver
sees two seemingly similar pictures of a smiling face when
these pictures are positioned upside down, but not when they are 
turned to their upright position (Figure 1). 

The effects of the Thatcher illusion are still relatively subtle, but
"change blindness" (Simons and Rensink, 2005) and "inattentional
blindness" (Mack, 2003) paradigms in psychology have provided 
extensive evidence that we often fail to notice substantial changes 
in a visual scene, such as the changing color of a car or the dis- 
appearance of a plane's engines, even when asked to look for any 
changes taking place. Moreover, it appears to the participants that 
they perceive the scene as a whole without any factual gaps. While 
unusual orientations, flickering, splashes, and other artificial tech- 
niques help, they are not needed to induce these effects. Consider, 
for instance, the "gorillas in our midst" experiment (Simons and 
Chabris, 1999)^. Participants were instructed to count the number 
of basketball passes between members of one team of basketball 
players, all wearing the same-colored shirts. In the middle of the 
video a person dressed up in a full-body gorilla suit strolls right 
into the midst of the passing players. The "gorilla" stops to face the 
camera, pounds its chest, and then wanders off. Incredibly, around
50% of people fail to notice that anything out of the ordinary has 
taken place. 

Researchers also found perceptual "blindness" in real-life social 
situations (Simons and Levin, 1998). They had an assistant pre- 
tend to be lost on campus and then to approach a random passerby 
for directions and for help in using a map. In the middle of 



^The video is available here: http://www.youtube.com/watch?v=vJG698U2Mvo. 




FIGURE 1 | The "Thatcher illusion" (Thompson, 1980). We first see two
more-or-less identical faces. However, when they are turned around 180° to
their proper orientation, it turns out that one face had been manipulated. 
These abnormal physical details had previously been perceptually obscured 
behind the meaningful experience of seeing a replication of the left-hand 
image. (This figure was first published in Thompson, P. "Margaret Thatcher:
a new illusion" Perception, 1980, 9, pp. 483-484, reproduced by kind 
permission of Pion Ltd, London. Website: www.perceptionweb.com). 
this interaction two other assistants carrying a large opaque door
rudely barged through the two interactants. During this brief 
interruption the "lost" person was quickly replaced with another 
person playing the same role, and afterwards the interaction con- 
tinued. Astonishingly, in about 50% of cases the passerby failed 
to notice that their interlocutor had been swapped for a different 
person. In one variation of the experiment the two swapped peo- 
ple were both dressed as construction workers. The researchers 
comment: "One subject who failed to detect the change essen- 
tially stated our predicted hypothesis: She said that she had just 
seen a construction worker" (Simons and Levin, 1998, p. 648). 
They concluded that, provided that the overall meaning of the 
scene is unchanged, physical changes to seen persons often escape 
awareness even when they occur during a natural, real-world 
interaction. 

These findings are surprising from the representationalist per- 
spective of the HPP, but they are in accordance with the HDP. 
Absence of physical details is to be expected if experience is primar- 
ily about situating perceivers in a meaningful and goal-directed 
relationship with their environment, for example based on our 
capacities for action (Noe, 2002). And the same applies to our 
experience of other people. For example, the person-swapping 
experiments provide empirical support for the phenomenological 
analysis by Gurwitsch ([1931] 1979) about how other people are 
often primarily perceived in terms of their social roles. Although 
more comparative research needs to be done in order to determine 
to what extent these findings can be generalized to the perceptual 
experience of non-human primates, there are no a priori reasons 
to assume that they are exempt from meaningful perception and 
its consequences. Moreover, as the next section demonstrates, 
such a generalization is theoretically supported because of the 
parsimonious account it provides for comparative research on 
imitation. 

LOGICAL CONSEQUENCES OF THE HDP REGARDING 
IMITATION 

We are now in a position to take a closer look at the logical 
consequences of the HDP for current issues in imitation research. 

HDP-(1): The intentional actions of other agents are primarily
perceived in terms of their meaning and goal-directedness.

HDP-(2): Given HDP-(1), the other's abstract physical motions in
objective space are not primarily observed via perceptual 
experience. 

HDP-(3): Given HDP-(2), observation of the other's abstract 
physical motions in objective space requires additional 
cognitive effort. 

Note that HDP-(2) is a less constraining consequence than 
HPP-(2), because even though direct perception is the normal 
default mode, perception of physical details is not necessarily 
impossible altogether. But the HDP predicts that perception of 
abstract physical details of meaningful actions requires additional 
cognitive effort of some kind, such as detached observation, con- 
trolled shifts of attention, explicit analysis of isolated perceptual 
stimuli, and so forth. This has logical implications for the way in 
which observed actions can be replicated. 



HDP-(4): Given HDP-(2), an agent's replication of the observed 
behavior of another agent is primarily guided by the 
other's perceived goals. 

HDP-(5): Given HDP-(3), an agent's replication of the observed
behavior of another agent can also be guided by the 
other's perceived physical motions, but only to the extent 
that the observer has the required additional capacities.

We therefore arrive at the following prediction:

HDP-(6): Given HDP-(4) and HDP-(5), copying the means of 
an observed action is cognitively more demanding than 
emulating its intended results. 

In other words, according to the HDP, because the physical 
details of an observed meaningful action tend to be obscured by 
top-down effects on perception, the default mode of replicating 
behavior is emulation. On this view, and in direct contrast to the 
traditional view of imitation (e.g., Tomasello, 1999), understanding
the other's goals and intentions does not facilitate imitation,
but actually hinders it because such direct insight obscures the 
precise means. Imitation requires individuals to change atten- 
tion from what the other's goals are to how the other's actions 
are precisely realized, while emulation is possible without this 
extra effort. This proposal better accords with the fact that faithful 
imitation is less common in non-human primates although they 
have a range of social capacities, including goal understanding 
(Tomasello, 2008, p. 177).

It could be countered that neonates lack the requisite cogni- 
tive skills to control their focus of attention, and yet they are 
still able to imitate with flexibility. However, the need for higher- 
level cognition to reveal the physical details of the means of an 
observed action is premised on the fact that the action has an 
intelligible goal. But if the other's goal is completely unintelligi- 
ble, for example because it is an unfamiliar conventional practice, 
then there is no perceived meaning that could be competing per- 
ceptually and attentionally with recognition of the fine-grained 
physical means of the other's action. In other words, if the other's 
goal is not directly perceptually given, then a replication of the 
observed behavior should be more easily guided by its actual 
physical execution - without any need for extra cognitive effort. 

These considerations amount to the related hypothesis that an 
individual's propensity to emulate is dependent on the direct intel- 
ligibility of the observed action. Conversely, imitation is expected 
to be more frequent whenever the goal of the other's action is per- 
ceptually opaque. According to HDP, therefore, it is possible to 
evaluate the space of observable actions according to their poten- 
tial direct perceptual intelligibility when viewed by a conspecific. 
We propose three broad partially overlapping categories that lead 
from intelligibility to opacity: 

(1) Biologically constrained behavior (completely intelligible to 
conspecifics), 

(2) Contextually constrained behavior (partially intelligible to 
conspecifics), and 

(3) Conventionally constrained behavior (completely unintelligi- 
ble, unless seen by an enculturated group member). 

The category of "biologically constrained behavior" includes 
all communicative expressions and actions whose meaning and 
goals are constrained by species-specific biology. Understanding 
of such actions is instinctual. For example, a wolf will never misunderstand
a conspecific wagging its tail. We predict that copying of
this kind of behavior will nearly always be emulative, while precise
imitation is exceedingly difficult because it requires a concerted 
effort of detachment and analysis (although in practice the lim- 
ited number of alternative means to achieve the same result may 
make this replication appear close to copying the means). 

The category of "contextually constrained behavior" includes
more ambiguous action types, because intelligibility also depends 
on the relation of the observed behavior to the observed social 
and natural context. The more species-atypical and mediated the 
action is, the less intelligible it appears. For example, when one ape 
observes another one reaching into a tree, grabbing something and 
then putting it into its mouth, this should be directly perceived as 
feeding without need for reflection. Intelligibility of this behavior 
is not simply determined by species-specific biology, but rather 
enabled by the characteristics of the situation, i.e., by directly 
observing the effects that an action has on the environment. On 
the other hand, the paradigmatic class of more mediated behaviors 
is the learned use of tools, for example when a chimpanzee uses a 
stick to fish for insects or honey (Humle etal., 2009). The way in 
which a behavior of this category will be copied depends on how 
self-evident its function is. The default mode of copying will still be
emulation. However, aspects of instrumental action and tool-use 
whose intentions are obscured, including conventionally deter- 
mined norms of behavior, require closer observation and more 
imitative learning, thereby leading to the emergence and preser- 
vation of cultural traditions that are not determined by functional 
considerations alone. While most pronounced in humans, such
traditions have also been demonstrated in non-human primates, 
including chimpanzees (Whiten et al., 1999), orangutans (Krützen
et al., 2011), and capuchin monkeys (Perry, 2011).

Uniquely, the perceived meaning of a "conventionally con- 
strained behavior" is neither constrained by an internal biological 
necessity nor by the external environment. Instead, the relation- 
ship between a behavior and its meaning is primarily based on 
a social convention that is largely arbitrary. The reasons for the 
specific form of the relationship are irrelevant and usually not 
known to the community. Although not absent in non-human 
primates, this category of behaviors is especially typical for human 
actions, including language, customs, and rituals. More recently, 
it also includes writing and complex technology. In these cases the 
default mode of copying an observed behavior will be imitation,
at least for young children and other cultural outsiders, because 
its meaning is not directly intelligible without having already been 
enculturated. Our theory therefore predicts that imitation is most 
frequently done by young individuals in response to the perception 
of unfamiliar social behavior while developing in a richly symbolic 
culture. 

We also predict that the more an action is conventionally con- 
strained, the more it will be faithfully imitated by group members, 
thereby restricting variation in performances of that action within 
the community. For example, in the case of chimpanzees we therefore
expect more variation in using a stick to fish for
termites (a mostly contextually constrained action) than in
the hand-clasp interaction during mutual grooming (a mostly
conventionally determined action). However, this prediction is
also age-dependent. During enculturation imitation is eventually
replaced by emulation, because as the meaning of conventionally 
constrained actions is learned, their meaning will become per- 
ceptually transparent just at the same time as the perception of 
their underlying physical means will require increasing effort. As
we know in the case of humans, the perception of certain physical
details will eventually become nearly impossible for adults, for
example the distinction between the sounds of the English "l" and
"r" by adult Japanese speakers.

Regarding non-human primates there is a classic experiment 
conducted by Tomasello et al. (1997), which investigated imitation
in response to a novel arbitrary social gesture that did not
involve tool-use in two groups of captive chimpanzees. Since the 
outcome of this experiment at first sight appears to contradict our 
theory, we will consider it in more detail here. On three occasions 
the experimenters temporarily separated a dominant female chim- 
panzee from her group to extensively train her alone to perform an 
arbitrary gesture to receive treats. After the trained chimpanzees 
were returned to their groups, they were called to the fence where 
they spontaneously began to perform the learned gesture, and 
thus received the coveted treats in full view of the other members 
of their group. The other members were highly motivated to get 
their own treats, but they performed their usual begging gestures 
to the experimenters; none of them attempted to imitate the new 
gestures of the trained chimpanzees. 

However, worries have already been raised about the possibility 
that negative findings derived from experimental paradigms using 
food-related actions might fail to generalize to the imitation of 
social conventions (Watson and Caldwell, 2009). Relatedly, from 
the perspective of our theory, the main problem with this partic- 
ular study is that it tried to replace an existing instinctive gesture 
with a novel gesture within a highly familiar context, i.e., begging 
from humans. The highly familiar situation of food-procurement 
enables chimpanzees to directly understand a conspecific's begging 
action as such in a contextually enabled manner, no matter that it
is performed differently. This social understanding of the other's 
goal, according to our theory, makes emulation the more likely 
response. Moreover, there already existed an instinctual response, 
the open-hand begging gesture, which is universally understood 
by humans and chimpanzees alike. Indeed, the experimenters 
recorded these normal begging gestures, showing that there was no 
communication problem as such. The chimpanzees understood 
what the trained chimpanzees were doing, i.e., begging for food, 
and they also understood that the experimenters understood what 
they themselves were doing, i.e., begging as well. Given this general 
understanding of the situation by the observing chimpanzees, our 
theory predicts correctly that the physical details of the trained
gestures were obscured behind a direct perception of the other's 
goals, thereby leading to emulation. 

Note that the two theories of perception make differing predictions
regarding the role of perception and intelligibility for
imitation. First, whereas the HPP entails that the understand- 
ing of a perceived action is a secondary, independent cognitive 
process, the HDP entails that understanding cannot be easily separated
from perception itself. As we have shown, there is a growing
body of evidence to support the latter assertion. In terms of measurable
effects on imitation, there is at least one study that
directly manipulated perceptual experience, and the results are
better accommodated by the HDP. It was found that perceptual 
salience of the demonstrated behavior, as varied by familiarization 
and distractors, modulates infants' propensity for rational imitation
(Beisert et al., 2012). For instance, when an adult operated a
light switch using their head while their hands were visibly unoc- 
cupied, 10 out of 14 (73.3%) infants imitated the action, which 
seems like a rational choice because there could have been an 
important reason for avoiding the easier option of using the hands. 
But when the experiment was repeated with two distracting smi- 
ley faces placed on the table, one next to each of the unoccupied 
hands, the tendency for such "rational imitation" was attenuated: 
although the rational choice should have been the same as before, 
only 8 out of 15 infants imitated the action (53.3%; Beisert et al.,
2012, p. 3). It is therefore possible that some negative results of imi- 
tation in non-human primates also suffered from the perceptual 
effects of distraction and lack of salience. 
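
As a rough arithmetic check on the proportions just cited, the following sketch recomputes the percentages from the counts as they are printed in this text. The helper name `percent` is a hypothetical convenience and is not taken from Beisert et al. (2012). Under these counts, 8 of 15 does correspond to 53.3%, whereas 10 of 14 corresponds to 71.4% rather than 73.3%. A value of 73.3% would match, for instance, 11 of 15, but the text does not settle which figure is intended, so the counts are best verified against the original study.

```python
def percent(successes: int, total: int) -> float:
    """Return successes/total as a percentage, rounded to one decimal place.

    Hypothetical helper for illustration only.
    """
    return round(100 * successes / total, 1)


# Counts as printed in the text above (not independently verified).
no_distractor = percent(10, 14)    # hands visibly free, no distractors
with_distractor = percent(8, 15)   # smiley faces placed next to the hands

print(f"No distractor:   {no_distractor}%")    # 71.4%
print(f"With distractor: {with_distractor}%")  # 53.3%
```

Whichever baseline figure is correct, the direction of the reported effect is unchanged: fewer infants imitated when distractors were present.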

Second, whereas the HPP entails that the physical details 
of the world are just as easily perceived no matter their intel- 
ligibility, because intelligibility is a later stage of inferential 
processing, the HDP entails that physical details and intel- 
ligibility are in conflict with each other. To be sure, both 
the HPP and the HDP are compatible with findings showing 
that diminished intelligibility is positively correlated with imi- 
tation. But whereas the HDP leads us to the prediction that 
imitative responses become less frequent with increased intelli- 
gibility, the HPP is neutral on this point or even predicts the 
opposite, e.g., the traditional hypothesis that humans have a 
greater ability for imitation compared to chimpanzees because 
humans are better at making sense of others' goals (i.e., only 
they have a "theory of mind"). It is therefore also possible 
to arbitrate between these theories by evaluating the extent to 
which the absence of imitation is related to the presence of 
intelligibility. 

THEORIES OF THE DEVELOPMENT AND EVOLUTION OF 
IMITATION 

We finish by contrasting the consequences of the HPP and HDP in 
terms of theories of the development and evolution of imitation. 

EXPLAINING THE DEVELOPMENT OF IMITATION 

Imitation starts from the first moments of life. For example, 
it has been found that human neonates can imitate a variety 
of arbitrary facial gestures (Meltzoff and Moore, 1977), but so 
can chimpanzee neonates (Myowa-Yamakoshi, 2006; Bard, 2007) 
and even macaque neonates (Ferrari et al., 2006). Delayed imitation
studies with 1-week-old rhesus macaques suggest that these
are not mere reflexes, but are to some extent under intentional
control (Paukner et al., 2011). This prevalence of early imitation
is what we would predict given that young infants do not
yet have well-developed social understanding of even the most 
basic gestures, and are thus more likely to perceive physical 
features. Conversely, it is expected that this precise imitation 
decreases as basic social competence increases, because the devel- 
opment of categorical perception and social understanding will 
start to obscure the physical details of what is directly perceived
as meaningful. And, indeed, it is well known that human
neonatal imitation disappears after 2-3 months of age (Jones,
2009), and a similar developmental trend exists for chimpanzee
neonates:

At less than 7 days of age, the chimpanzees could discriminate between 
and imitate several human facial gestures. However, by the time they 
were 2 months old, the chimpanzees no longer imitated the gestures. 
They began to perform the mouth opening (MO) gesture frequently in
response to any of the three facial gestures presented to them. This 
response could be considered as "social smiling" (i.e., play face) directed 
at the human experimenter (Myowa-Yamakoshi, 2006, pp. 223-224). 

Myowa-Yamakoshi (2006) also noted that the disappearance 
of facial imitation might be related to social-interactive responses 
toward the experimenter. We agree and suggest that the physi- 
cal realization of the facial gesture, i.e., face with/without tongue 
and/or lip protrusion, has become perceptually obscured by the 
acquired basic social understanding. The otherwise puzzling dis- 
appearance of neonatal imitation in human and non-human 
primates can therefore be understood in terms of the devel- 
opment of a basic, still largely biologically constrained, social 
competence. 

Young humans' propensities for faithful imitation of arbitrary
gestures reliably reappear after 1.5-2 years of age (Jones,
2009). Infants' over-imitation of tool-related actions, even of evi- 
dently unsuccessful ones, increases with age, becoming the default 
response after around 2 years of age (Nielsen, 2006). This reap- 
pearance of imitation could reflect a new social learning process 
that specifically responds to the unintelligibility of conventionally 
constrained behavior, such as culturally mediated social interac- 
tion. Relatedly, we expect that emulation will once again become 
the default mode of imitation in adults, because most convention- 
ally constrained actions will by then have become perceptually 
intelligible, thereby once again obscuring the underlying means. 
Indeed, in Horowitz's (2003) study of adult human imitation the 
details of the demonstrator's motions apparently remained out- 
side of the observer's perceptual focus, as confirmed by anecdotal 
reports: "In casual conversation during the debriefing period, 1 
subject remarked after hearing that the experiment gauged her 
level of imitation 'Oh, you mean when I saw you messing with 
the box, if I imitate that?'" (Horowitz, 2003, p. 333). We suggest
that the participants primarily saw the demonstration as a general 
"messing about" whose physical details were perceptually obscured 
by the self-evident goal of opening the device. 

Interestingly, a similar later developmental trend has been 
observed in the case of chimpanzees. After the disappearance of 
neonatal imitation, a reemergence of precise imitation has been 
observed to occur around 9 months of age (Myowa-Yamakoshi, 
2006). In one field study several young chimpanzees, but none of 
the adults, were documented to imitate the idiosyncratic actions 
of a disabled adult chimpanzee (Hobaiter and Byrne, 2010). 
Thus, at some point the propensity for imitation in young chim- 
panzees decreases once again, as is also demonstrated by a host of 
experiments involving captive adult chimpanzees (e.g., Tomasello 
et al., 1987, 1997; Nagell et al., 1993; Bjorklund et al., 2002). This
broad similarity to the non-linear development of imitation in 
young humans suggests that juvenile chimpanzees may also aim 
to acquire the conventionally determined behavior of their group. 

EXPLAINING THE EVOLUTION OF IMITATION

One popular hypothesis is that human imitation first emerged 
because of a necessity for young individuals to learn complex 
tool-making techniques (Csibra and Gergely, 2006). The main 
idea is that humans are more prone to imitation because natural 
selection honed them to focus their attention on others' complex 
tool-related actions, rather than just their goals or effects on the 
environment (Tomasello, 2008, pp. 208-209). At the same time 
it is recognized that the success of imitative learning depends
not so much on slavishly copying others' movements as on a
hierarchical analysis of overall goals and plans, leading
to "program-level" imitation (see also, e.g., Byrne and Russon,
1998; Tomasello et al., 2005). On this view, faithful imitation
was only later adapted for imitating socially determined behavior
(Tomasello et al., 2005, p. 687).

We agree that successful imitation depends on learning to refo- 
cus attention to specific aspects of observed actions, although our 
account differs slightly. Evidence for so-called "program-level imi- 
tation" (Byrne, 2003) fits with the idea that observers first perceive 
the other's general intention, while refocusing on the physical 
details of the component movements requires additional effort. 
Nevertheless, the hypothesis that precise imitation in humans 
evolved specifically because of the need to copy complex tool-use 
does not sit easily with the experimental evidence. Over-imitation
by children and under-imitation by adults are puzzling phenom- 
ena if precise copying of tool-based functionality was the primary 
evolutionary pressure for human imitation. 

There is another issue with the hypothesis of tool-related
origins of imitation, namely its tendency to overestimate the
opacity of observed tool-use behavior. Apart from complex modern
technology, most learning of new tool-use practices can be
guided by close observation and practice, as demonstrated by 
young chimpanzees in the wild (Inoue-Nakamura and Matsuzawa, 
1997; Biro et al., 2006). However, no matter how many times you 
say "bring me that ball" to a pre-linguistic infant, the meaning 
of this communicative action will remain elusive unless it is tied 
into a pragmatic context involving the speaker, the listener, and a 
ball (Tomasello et al., 1993). We therefore agree with Csibra and
Gergely (2009) that the acquisition of human cultural practices is 
facilitated by specialized gestures, such as faithful imitation and 
ostensive signals. However, on our view, they misjudge what pre- 
cisely is special about human culture by accepting the HPP as their 
starting point. 

[T]o acquire the relevant knowledge through observation sets an 
ill-posed inverse problem: a behavior can always be generated and 
explained by an infinite number of different mental state combina- 
tions, representing diverse goals and/or different types of background 
knowledge. This difficulty is just multiplied when observing mediated 
(recursive) tool use [e.g. when one tool is used to produce another 
tool], where no perceptible reward would inform the observer about 
the tool's function and, in the absence of that, there is no way to assess 
the relevance of any element of the behavior observed. (Csibra and 
Gergely, 2006, p. 252.) 

The HPP commits Csibra and Gergely to the questionable 
claim that, from the point of view of an external observer, any 
behavior could be caused by an "infinite" number of mental states
because nothing but the physical states of an action are observable.
However, according to the HDP, such an absolute "inverse
problem" normally does not exist in practice, even when observ- 
ing recursive tool-use. Csibra and Gergely illustrate the concept of 
recursive tool-use by contrasting a child's observation of someone 
using a tool to peel away the hard skin of a fruit (presumably to 
eat its interior) compared to the child observing someone using 
a tool to carve away bits of a piece of wood (presumably to make 
a pointy spear). We agree that the latter, recursive action would 
be less intelligible than the former, but many of its aspects would 
still be sufficiently contextually constrained to be intelligible for 
the child. For example, the wood carver's attention will be focused
on the shape of the tip (and not on the flakes falling down or the 
sounds that are made); he may look at it, feel it with his fingers, 
clean away bits that get stuck, etc. He may also throw the spear 
at some target to check its effectiveness, and if not satisfied, con- 
tinue carving some more. Once done, he will take the resulting 
spear on the hunt where its utility in killing prey will be put to 
the test; if it happens to break, he may carve a new tip. In other 
words, the meaning of the tool-based making of this tool is largely 
intelligible because it is embedded in contextually constrained 
practices. 

To be sure, Csibra and Gergely's general hypothesis that imitation
is related to the copying of unintelligible behaviors matches
our own theory (see also Gergely and Csibra, 2006), but their 
commitment to the HPP prevents them from appreciating the 
qualitative difference between observing tool-use (even of the 
recursive kind) and observing actions that are based entirely on 
social conventions. We argue that it is only when the child is 
observing an unfamiliar conventionally constrained behavior that 
she is in fact confronted by genuine opacity. This is why over- 
imitation, as well as pedagogy (Csibra, 2007), are observed mainly 
in humans: our survival and success depend on learning social
conventions. Young non-human primates are also keen to learn 
tool-based actions from adults and exhibit some imitative learn- 
ing. For example, young chimpanzees engage in long periods of 
observational learning of cracking nuts by using an anvil and hammer
stone (Biro et al., 2006), and young capuchin monkeys have
also been shown to exhibit observational learning of tool-use (Fredman
and Whiten, 2008). But, as far as we know, young non-human 
primates in the wild, in the absence of extensive symbolically medi- 
ated social conventions like those characterizing humans, have to 
learn only a limited number of arbitrary social conventions in 
order to become successful group members, and the opacity of 
others' actions is therefore more manageable. 

The unique perceptual opacity of unfamiliar conventionally 
constrained behavior also helps to clarify the relationship between 
imitation and social conformity. It has been hypothesized that 
the higher prevalence of imitation in humans could derive from 
their need to enact an appropriate sociocultural identity so as 
to become an accepted member of their social group. On this 
view, human imitation is premised on a shared social contract, 
whereas "there is no good evidence that apes imitate others only 
for social conformity and/or solidarity" (Tomasello, 2008, p. 213). 
We agree with the idea that imitation helps young humans to 
develop into successful members of their group, but we argue that 
this phenomenon is not strictly limited to humans. 
For example, an experiment with captive chimpanzees has
found that adults tend to conform to norms of tool-use even if
other styles of usage are known (Whiten et al., 2005a). That this
social conformity is partially related to affirming group member- 
ship cannot be ruled out. For instance, chimpanzee neonates are 
more likely to imitate in a communicative situation (Bard, 2007), 
and young chimpanzees rarely imitate facial gestures in the absence 
of ongoing bodily contact with the demonstrator, which sug- 
gests that social bonding is an essential element of their imitation 
(Myowa-Yamakoshi, 2006). Solidarity is also observed. For exam- 
ple, adult chimpanzees help each other upon request even when 
there is no immediate possibility for reciprocity (Yamamoto et al., 
2009), and they console victims of bullying (Fraser et al., 2008).
Adult bonobos collaboratively share food in the wild (Savage-Rumbaugh
et al., 1998, pp. 219-225). These findings are not
restricted to apes. Conformity to social norms of food preference
is documented for wild vervet monkeys (van der Waal et al.,
2013), and imitation has been shown to enhance social bonding
in capuchin monkeys (Paukner et al., 2009). However, despite this
growing evidence of social conformity and solidarity, there is little 
evidence of over-imitation in non-human primates. 

These considerations lead us to formulate a revised hypoth- 
esis regarding the origin of imitation in humans. We agree with 
Tomasello (2008) that the greater prevalence of imitation in young
humans is related to the formation of a shared social contract, but 
we specify that this is because the human social contract is defined 
by conventionally constrained practices. Our social identities are 
largely constituted by the arbitrary symbol systems in which we 
grow up. From this perspective it is no longer surprising that 
human children are especially prone to over-imitation. Their best
bet for becoming enculturated into their social group is to focus
their imitative learning on opaque actions whose function
does not appear contextually constrained. Over-imitation is therefore
far from being surprisingly unselective. It is a highly selective
behavior that picks out precisely those aspects of social interaction 
that are most likely part of conventionally constrained practices. In 
other words, over-imitation is an effective developmental response 
to becoming enculturated in a symbolic culture. Similarly, encul- 
turated apes - apes who were cross-fostered by humans and who 
have been embedded in richly symbolic environments - imitate 
arbitrary actions more readily than do other institutionalized apes 
(Custance et al., 1995; Rumbaugh et al., 2008).

CONCLUDING REMARKS 

All theories of imitation agree that the more conventionalized 
social interactions are, the more imitative social learning is 
required to become a successful member of the group. However, 
theories based on the HPP and our theory based on the HDP 
differ in one crucial respect: the former are united by the assump- 
tions that physical details are always and exclusively perceived, 
and that this perceptual access is independent of any subsequently 
realized intelligibility. Our proposal builds on experimental
and phenomenological insights that the direct perception of an 
action's goal and meaning is primary, and that there is therefore a 
conflict between perception of physical details and their intelligi- 
bility. Perception and intelligibility are joined together in a process 
of sense-making that accords primacy to meaning over physical
details whenever this is possible, and if not counteracted by extra
cognitive effort. Replacing the HPP with the HDP leads us to make 
several predictions that better fit the data. 

Like most theories of imitation, we expect that an individual
developing in a highly conventional cultural context will be more
prone to imitate faithfully; but we explain this
insight more consistently than other theories by emphasizing 
that conventionally constrained behaviors are uniquely opaque 
to young and uninitiated observers when compared to other 
types of action. We go against existing theories by predicting 
that an individual's propensity to imitate is inversely correlated 
with her development and enculturation, as indexed for example 
by age and social competence. Rather than facilitating imita- 
tion, the acquisition of social understanding grants the observer 
direct perceptual intelligibility of others' behaviors, such that 
emulation will ultimately become the default mode of copying 
behavior in adulthood. Faithful imitation, on the other hand, 
will increasingly require additional effort because the underlying 
physical details of others' expressions will become perceptually 
obscured behind their directly understood significance. The pro- 
posed inverse correlation between direct perception and faithful 
imitation also explains why emulation is comparatively more fre- 
quent in non-human primates, since most of their behaviors 
are instinctively and contextually intelligible while only a few 
behaviors are conventionally constrained. In this way the HDP 
is able to explain a wider range of data in a more parsimonious 
manner. 